Permutation. Index entries are sorted alphabetically (Figure 2.c). The index processor must differentiate among different types of keys such as strings, numbers, and special symbols. Upper and lower case letters should be distinguished. Furthermore, it may be necessary to handle roman, arabic, and alphabetic page numbers.
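The key classification described above can be sketched as follows. This is only an illustration: the ordering of the classes (special symbols, then numbers, then strings) and the tie-breaking rule for letter case are assumptions made for the example, and the names `key_class` and `sort_key` are hypothetical.

```python
def key_class(key: str) -> int:
    """Classify a key: 0 = special symbol, 1 = number, 2 = string.

    The relative order of the classes is an assumption of this sketch.
    """
    if key.isdigit():
        return 1
    if key[:1].isalpha():
        return 2
    return 0


def sort_key(key: str):
    """Build a tuple so that sorted() compares keys class by class."""
    cls = key_class(key)
    if cls == 1:
        # Numbers compare by value, so 2 sorts before 10.
        return (cls, int(key), "", "")
    # Strings and symbols compare case-insensitively first; the raw key
    # breaks ties, which keeps "Alpha" and "alpha" distinct (assumed rule).
    return (cls, 0, key.lower(), key)


def sort_keys(keys):
    return sorted(keys, key=sort_key)
```

For instance, `sort_keys(["beta", "10", "Alpha", "2", "*", "alpha"])` places the symbol first, then the numbers in numeric order, then the strings with the two spellings of alpha kept apart.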
Merging. Different page numbers corresponding to the same
index key are merged into one list. Also, three or more
successive page numbers are abbreviated as a range (as in the
case of alpha, iv, 1-3, Figure 2.c). If citations on successive
pages are logically distinct, good indexing practice suggests that they
should not be represented by a range. Our system allows user control
of this practice.
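A minimal sketch of the merging step is given below. It assumes arabic page numbers only (roman and alphabetic numbers would need their own conversion), and the `use_ranges` flag is a hypothetical stand-in for the user control mentioned above, not the system's actual interface.

```python
def merge_pages(pages, use_ranges=True, min_run=3):
    """Merge page numbers for one index key into a printable list.

    Duplicates are dropped, and runs of at least `min_run` successive
    pages are abbreviated as "first-last" when `use_ranges` is true.
    """
    nums = sorted(set(pages))
    if not use_ranges:
        return [str(n) for n in nums]
    out = []
    i = 0
    while i < len(nums):
        # Extend j to the end of the run of successive pages starting at i.
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i + 1 >= min_run:
            out.append(f"{nums[i]}-{nums[j]}")
        else:
            out.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return out
```

With this sketch, pages 1, 2, 3 collapse to `1-3`, while the pair 9, 10 stays as two separate numbers because it is shorter than three pages.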
Subindexing. Multi-level indexing is supported.
Here, entries sharing a common prefix are grouped together
under the same prefix key.
The special symbol `!' serves as the level operator
in the example (Figures 2.a and 2.b). Primary indexes are
converted to first level items (the \item entries in Figure 2.c)
while subindexes are converted to lower level items
(e.g., \subitem or \subsubitem entries in Figure 2.c).
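The grouping by common prefix can be sketched as follows. The sketch assumes at most three levels, ignores page numbers, and uses plain string comparison for sorting; the function name `format_entries` is hypothetical.

```python
LEVEL = "!"  # level operator used in the example
ITEM_COMMANDS = ["\\item", "\\subitem", "\\subsubitem"]


def format_entries(keys):
    """Turn keys such as "alpha!one" into \\item / \\subitem lines.

    A prefix shared with the previous entry is printed only once.
    """
    lines = []
    prev = []
    for parts in sorted(key.split(LEVEL) for key in keys):
        if len(parts) > len(ITEM_COMMANDS):
            raise ValueError("this sketch supports at most three levels")
        # Count the leading levels shared with the previous entry.
        common = 0
        while (common < min(len(prev), len(parts))
               and prev[common] == parts[common]):
            common += 1
        # Emit only the levels that differ from the previous entry.
        for depth in range(common, len(parts)):
            lines.append(f"{ITEM_COMMANDS[depth]} {parts[depth]}")
        prev = parts
    return lines
```

Given the keys `alpha`, `alpha!one!deep`, `alpha!two`, and `beta`, the prefix `alpha` is emitted once as an `\item`, and the remaining components appear beneath it as `\subitem` and `\subsubitem` lines.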
Actual Field. The distinction between a sort key
and its actual field is made explicit.
Sort keys are used in comparison while their actual counterparts are
what end up being placed in the printed index.
In the example, the `@' sign is used as the actual field operator,
which means its
preceding string is the sort key and its succeeding string
is the actual key (e.g., the \index{alpha@{\it alpha\/}}
in Figure 2.a).
The same sort key with and without an actual field
is treated as two separate entries (cf. the plain and italic alpha
entries in the example). If a key contains no actual operator,
it is used as both the sort field and the actual field.
The separation of a sort key from its actual field makes entry sorting
much easier. If there were only one field,
the comparison routine would have to ignore syntactic sugar related to
output appearance and compare only the ``real'' keywords.
For instance, in {\it alpha\/}, the program has to ignore the font
setting command \it, the italic correction command \/,
and the scope delimiters {}.
In general, it is impossible to know
all the patterns that the index processor should ignore,
but with the separation of the fields, the sort key is used as a verbatim string
in comparison; any special effect can be achieved via the actual field.
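The separation of the two fields can be illustrated with a short sketch. It assumes the `@` operator appears at most once per key and does not handle any escaping of the operator; `split_actual` and `printed_order` are hypothetical names.

```python
ACTUAL = "@"  # actual field operator used in the example


def split_actual(key):
    """Return (sort_key, actual_field) for one index key.

    Without the operator, the key serves as both fields.
    """
    sort_key, sep, actual = key.partition(ACTUAL)
    return (sort_key, actual if sep else sort_key)


def printed_order(keys):
    """Sort on the verbatim sort key, then list what would be printed.

    Entries with the same sort key but different actual fields remain
    separate because the pair (sort_key, actual_field) identifies an entry.
    """
    entries = sorted({split_actual(key) for key in keys})
    return [actual for _, actual in entries]
```

Here `alpha@{\it alpha\/}` sorts under `alpha` without the comparison routine ever inspecting `\it`, `\/`, or the braces, and it stays distinct from the plain `alpha` entry.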
Page Encapsulation. Page numbers can be encapsulated
using the `|' operator. In the example,
page 14 on which \index{beta} occurs is set in boldface, as
represented by the command \bold. The ability to set
page numbers in different fonts allows the index to convey
more information about whatever is being indexed. For instance,
the place where a definition occurs can be set in one font, its
primary example in a second, and others in a third.
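A minimal sketch of page encapsulation follows. It assumes that the text after the `|` operator names a one-argument formatting command and that the page number is passed as that argument (so a hypothetical entry `beta|bold` on page 14 yields `\bold{14}`); the exact output form is an assumption of the sketch.

```python
ENCAP = "|"  # page encapsulation operator used in the example


def encapsulate(entry, page):
    """Split an entry at the encapsulation operator and wrap the page.

    Returns (key, formatted_page). Without an encapsulator the page
    number is returned unchanged.
    """
    key, sep, command = entry.partition(ENCAP)
    if sep and command:
        # Assumed output form: \command{page}
        return key, f"\\{command}{{{page}}}"
    return key, str(page)
```

Calling `encapsulate("beta|bold", 14)` gives the key `beta` with its page wrapped in `\bold`, whereas `encapsulate("beta", 14)` leaves the page number as plain text.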
Cross Referencing. Some index entries make references to
others. In our example the alphabeta entry is a reference
to beta, as indicated by the see phrase.
The page number, however, disappears after formatting (Step IV),
hence it is immaterial where index commands dealing with cross references
like see occur in the document.
This is a special case of page encapsulation (see{beta} appears
after the `|' operator).
Variations like see also, which gives page numbers as well
as references to other entries, work similarly.
Input/Output Style. In order to be
formatter- and format-independent, the index processor must be able
to handle a variety of formats. There are two reasons for considering
this independence issue on the input side. First, raw index files generated by systems
other than LaTeX may not comply with the default format. Second,
the basic framework established for processing indexes can also be
used to process other objects of similar nature (e.g., glossaries),
but these other objects will certainly have a different keyword
(e.g., \glossaryentry as opposed to \indexentry) at the very least.
Similarly, on the output side the index style
may vary for different systems. Even within the same formatting
system, the index may have to look different under different
publishing requirements. In other words, there must be a way to inform
the processor of the input format and the output style.
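One way to picture the input side of such a style description is sketched below. The `InputStyle` class and its attribute names are hypothetical, and the raw entry layout `keyword{key}{page}` is an assumption made for the example rather than a statement about any particular system's file format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class InputStyle:
    """Hypothetical description of the raw input format."""
    keyword: str = "\\indexentry"
    level: str = "!"
    actual: str = "@"
    encap: str = "|"


def parse_line(line, style=InputStyle()):
    """Parse one raw line of the assumed form keyword{key}{page}.

    Returns (key, page), or None if the line does not start with the
    keyword named in the style.
    """
    # The key may contain braces, so it is matched greedily; the page
    # field is assumed to contain no braces.
    pattern = re.escape(style.keyword) + r"\{(.*)\}\{([^{}]*)\}\s*$"
    match = re.match(pattern, line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Switching from indexes to glossaries then amounts to passing `InputStyle(keyword="\\glossaryentry")`; the processing code itself does not change.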